-
Notifications
You must be signed in to change notification settings - Fork 352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ANR due to the tokio runtime being blocked by getaddrinfo
when dropped.
#7210
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 23 of 23 files at r1, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! all files reviewed, all discussions resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 23 files at r1, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 23 files at r1, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 17 of 23 files at r1.
Reviewable status: 18 of 24 files reviewed, 7 unresolved discussions (waiting on @albin-mullvad, @dlon, @Pururun, and @Rawa)
talpid-core/src/connectivity_listener.rs
line 109 at r1 (raw file):
sender: UnboundedSender<Connectivity>, ) -> Result<(), Error> { let sender_ptr = Box::into_raw(Box::new(sender)) as jlong;
The lifetime of this pointer should be clearly documented
talpid-core/src/connectivity_listener.rs
line 204 at r1 (raw file):
.map_err(|cause| Error::FindMethod("ConnectivityListener", method, cause))?; env.call_method_unchecked(self.object.as_obj(), method_id, return_type, parameters)
How come we're using call_method_unchecked instead of the safe version?
talpid-core/src/connectivity_listener.rs
line 220 at r1 (raw file):
let connected = JNI_TRUE == connected; let sender = unsafe { Box::from_raw(sender_address as *mut UnboundedSender<Connectivity>) };
Should have // SAFETY
-reasoning.
Might also be worth to enable the undocumented_unsafe_blocks
clippy lint for this module
mullvad-api/src/lib.rs
line 326 at r1 (raw file):
Box::pin(async move { let addrs = tokio::task::spawn_blocking(move || (host, 0).to_socket_addrs())
Maybe comment on why we do this thread dance.
mullvad-daemon/src/lib.rs
line 617 at r1 (raw file):
Ok(listener) => listener, Err(error) => { log::warn!(
Nit: This seems like a fatal error, log::error
is more appropriate, no?
mullvad-daemon/src/android_dns.rs
line 1 at r1 (raw file):
#![cfg(target_os = "android")]
btw why is this defined in Daemon
and not mullvad_api
where the trait is? Because it depends on talpid-core?
mullvad-daemon/src/android_dns.rs
line 4 at r1 (raw file):
//! A non-blocking DNS resolver. `getaddrinfo` tends to prevent the tokio runtime from being //! dropped, since it waits indefinitely on blocking threads. This is particularly bad on Android, //! so we use a non-blocking resolver instead.
Minor: When there's a module that exports a single type, I personally prefer putting the docs on the type itself, not the module. Either way though, I think the module docs or the struct docs should link to each other where the real docs are, i.e:
/// See [module docs](self)
pub struct AndroidDnsResolver {}
// or
mod foo {
//! See [Thingy].
pub struct Thingy;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 14 of 23 files at r1, 5 of 6 files at r2.
Reviewable status: 23 of 24 files reviewed, 12 unresolved discussions (waiting on @dlon and @hulthe)
talpid-core/src/connectivity_listener.rs
line 220 at r1 (raw file):
Previously, hulthe (Joakim Hulthe) wrote…
Should have
// SAFETY
-reasoning.Might also be worth to enable the
undocumented_unsafe_blocks
clippy lint for this module
+1
mullvad-daemon/src/lib.rs
line 623 at r2 (raw file):
return Err(Error::DaemonUnavailable); } };
⛏️ std::result::Result::inspect_err + std::result::Result::map_err
#[cfg(target_os = "android")]
let connectivity_listener = match ConnectivityListener::new(android_context.clone())
.inspect_err(|err| {
log::warn!(
"{}",
error.display_chain_with_msg("Failed to start connectivity listener")
);
})
.map_err(|_| Error::DaemonUnavailable)?;
Code quote:
#[cfg(target_os = "android")]
let connectivity_listener = match ConnectivityListener::new(android_context.clone()) {
Ok(listener) => listener,
Err(error) => {
log::warn!(
"{}",
error.display_chain_with_msg("Failed to start connectivity listener")
);
return Err(Error::DaemonUnavailable);
}
};
mullvad-daemon/src/android_dns.rs
line 16 at r2 (raw file):
pub struct AndroidDnsResolver { connectivity_listener: talpid_core::connectivity_listener::ConnectivityListener, }
⛏️ Import talpid_core::connectivity_listener::ConnectivityListener
, do not qualify the ConnectivityListener
type
Code quote:
pub struct AndroidDnsResolver {
connectivity_listener: talpid_core::connectivity_listener::ConnectivityListener,
}
mullvad-daemon/src/android_dns.rs
line 38 at r2 (raw file):
format!("Failed to retrieve current servers: {err}"), ) })?;
Tip: Use std::io::Error::other
Code quote:
let ips = ips.map_err(|err| {
io::Error::new(
io::ErrorKind::Other,
format!("Failed to retrieve current servers: {err}"),
)
})?;
mullvad-daemon/src/android_dns.rs
line 46 at r2 (raw file):
let lookup = resolver.lookup_ip(host).await.map_err(|err| { io::Error::new(io::ErrorKind::Other, format!("lookup_ip failed: {err}")) })?;
Tip: Use std::io::Error::other
Code quote:
let lookup = resolver.lookup_ip(host).await.map_err(|err| {
io::Error::new(io::ErrorKind::Other, format!("lookup_ip failed: {err}"))
})?;
mullvad-jni/src/lib.rs
line 126 at r2 (raw file):
_ = context.runtime.block_on(context.running_daemon); // Dropping the tokio runtime will block if there are any tasks in flight.
Could we specify what "in flight" means in this context? I suppose tasks refer to tokio tasks, and if so, the tokio runtime will fail to drop if any currently running task never yields at an await point (i.e. if it performs blocking I/O). WDYT?
Code quote:
// Dropping the tokio runtime will block if there are any tasks in flight.
mullvad-api/src/https_client_with_sni.rs
line 422 at r1 (raw file):
.ok_or_else(|| io::Error::new(io::ErrorKind::Other, "Empty DNS response"))?; Ok(SocketAddr::new(*addr, port.unwrap_or(DEFAULT_PORT))) }
This many early returns warrants a doc-comment imo. If someone changes this functions, what invariants do they need to uphold? I.e. how is the sought address resolved.
I know that it coould be considered out of scope for this PR, but it is also a good oppurtunity to apply the boy scout rule 😊
Code quote:
async fn resolve_address(
address_cache: AddressCache,
dns_resolver: &dyn DnsResolver,
uri: Uri,
) -> io::Result<SocketAddr> {
const DEFAULT_PORT: u16 = 443;
let hostname = uri.host().ok_or_else(|| {
io::Error::new(io::ErrorKind::InvalidInput, "invalid url, missing host")
})?;
let port = uri.port_u16();
if let Ok(addr) = hostname.parse::<IpAddr>() {
return Ok(SocketAddr::new(addr, port.unwrap_or(DEFAULT_PORT)));
}
// Preferentially, use cached address.
//
if let Some(addr) = address_cache.resolve_hostname(hostname).await {
return Ok(SocketAddr::new(
addr.ip(),
port.unwrap_or_else(|| addr.port()),
));
}
// Use DNS resolution as fallback
//
let addrs = dns_resolver.resolve(hostname.to_owned()).await?;
let addr = addrs
.get(0)
.ok_or_else(|| io::Error::new(io::ErrorKind::Other, "Empty DNS response"))?;
Ok(SocketAddr::new(*addr, port.unwrap_or(DEFAULT_PORT)))
}
mullvad-problem-report/src/lib.rs
line 303 at r2 (raw file):
) .await .map_err(Error::CreateRpcClientError)?;
Code smell? 😉
Code quote:
let api_runtime = mullvad_api::Runtime::with_cache(
// This is irrelevant since no DNS lookups will be made
DefaultDnsResolver,
cache_dir,
false,
#[cfg(target_os = "android")]
None,
)
.await
.map_err(Error::CreateRpcClientError)?;
mullvad-api/src/lib.rs
line 364 at r1 (raw file):
handle: tokio::runtime::Handle, dns_resolver: impl DnsResolver, ) -> Result<Self, Error> {
Strictly speaking, do we always need to supply a DnsResolver
when creating a new api runtime? Couldn't we use DefaultDnsResolver
by default and expose the option to configure the DnsResolver
for an existing runtime on-the-fly?
Code quote:
pub fn new(
handle: tokio::runtime::Handle,
dns_resolver: impl DnsResolver,
) -> Result<Self, Error> {
mullvad-api/src/lib.rs
line 325 at r2 (raw file):
Ok(addrs.map(|addr| addr.ip()).collect()) } }
Document what mechanism that is used for DNS lookups in this implementation (https://doc.rust-lang.org/std/net/trait.ToSocketAddrs.html#tymethod.to_socket_addrs)
Code quote:
pub struct DefaultDnsResolver;
#[async_trait]
impl DnsResolver for DefaultDnsResolver {
async fn resolve(&self, host: String) -> io::Result<Vec<IpAddr>> {
use std::net::ToSocketAddrs;
let addrs = tokio::task::spawn_blocking(move || (host, 0).to_socket_addrs())
.await
.expect("DNS task panicked")?;
Ok(addrs.map(|addr| addr.ip()).collect())
}
}
mullvad-daemon/Cargo.toml
line 55 at r2 (raw file):
android_logger = "0.8" async-trait = "0.1" hickory-resolver = { version = "0.24.1" }
⛏️ Promote to a workspace dep?
Code quote:
hickory-resolver = { version = "0.24.1" }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 12 of 27 files reviewed, 9 unresolved discussions (waiting on @hulthe, @MarkusPettersson98, and @Rawa)
mullvad-api/src/https_client_with_sni.rs
line 422 at r1 (raw file):
Previously, MarkusPettersson98 (Markus Pettersson) wrote…
This many early returns warrants a doc-comment imo. If someone changes this functions, what invariants do they need to uphold? I.e. how is the sought address resolved.
I know that it coould be considered out of scope for this PR, but it is also a good oppurtunity to apply the boy scout rule 😊
Done. I want to refactor AddressCache
into something that implements DnsResolver
and falls back on some other DnsResolver
if something isn't in its "cache". But that's definitely scope creep.
mullvad-api/src/lib.rs
line 326 at r1 (raw file):
Previously, hulthe (Joakim Hulthe) wrote…
Maybe comment on why we do this thread dance.
Done.
mullvad-api/src/lib.rs
line 364 at r1 (raw file):
Previously, MarkusPettersson98 (Markus Pettersson) wrote…
Strictly speaking, do we always need to supply a
DnsResolver
when creating a new api runtime? Couldn't we useDefaultDnsResolver
by default and expose the option to configure theDnsResolver
for an existing runtime on-the-fly?
I slightly prefer the explicit way since it forces the caller to select an appropriate resolver.
mullvad-api/src/lib.rs
line 325 at r2 (raw file):
Previously, MarkusPettersson98 (Markus Pettersson) wrote…
Document what mechanism that is used for DNS lookups in this implementation (https://doc.rust-lang.org/std/net/trait.ToSocketAddrs.html#tymethod.to_socket_addrs)
Done.
mullvad-daemon/src/android_dns.rs
line 1 at r1 (raw file):
Previously, hulthe (Joakim Hulthe) wrote…
btw why is this defined in
Daemon
and notmullvad_api
where the trait is? Because it depends on talpid-core?
Yes, depending on all of talpid-core
seems excessive. I am tempted to move ConnectivityListener
out of talpid-core
, though.
mullvad-daemon/src/android_dns.rs
line 4 at r1 (raw file):
Previously, hulthe (Joakim Hulthe) wrote…
Minor: When there's a module that exports a single type, I personally prefer putting the docs on the type itself, not the module. Either way though, I think the module docs or the struct docs should link to each other where the real docs are, i.e:
/// See [module docs](self) pub struct AndroidDnsResolver {} // or mod foo { //! See [Thingy]. pub struct Thingy; }
Nice. I'll adopt that convention.
mullvad-daemon/src/lib.rs
line 617 at r1 (raw file):
Previously, hulthe (Joakim Hulthe) wrote…
Nit: This seems like a fatal error,
log::error
is more appropriate, no?
Fixed!
mullvad-daemon/src/lib.rs
line 623 at r2 (raw file):
Previously, MarkusPettersson98 (Markus Pettersson) wrote…
⛏️ std::result::Result::inspect_err + std::result::Result::map_err
#[cfg(target_os = "android")] let connectivity_listener = match ConnectivityListener::new(android_context.clone()) .inspect_err(|err| { log::warn!( "{}", error.display_chain_with_msg("Failed to start connectivity listener") ); }) .map_err(|_| Error::DaemonUnavailable)?;
👍
mullvad-jni/src/lib.rs
line 126 at r2 (raw file):
Previously, MarkusPettersson98 (Markus Pettersson) wrote…
Could we specify what "in flight" means in this context? I suppose tasks refer to tokio tasks, and if so, the tokio runtime will fail to drop if any currently running task never yields at an await point (i.e. if it performs blocking I/O). WDYT?
Added some more information. This is mostly a reminder in case we end up in the same situation.
mullvad-problem-report/src/lib.rs
line 303 at r2 (raw file):
Previously, MarkusPettersson98 (Markus Pettersson) wrote…
Code smell? 😉
Replaced with a null resolver.
talpid-core/src/connectivity_listener.rs
line 109 at r1 (raw file):
Previously, hulthe (Joakim Hulthe) wrote…
The lifetime of this pointer should be clearly documented
Done. (This is no longer relevant.)
talpid-core/src/connectivity_listener.rs
line 204 at r1 (raw file):
Previously, hulthe (Joakim Hulthe) wrote…
How come we're using call_method_unchecked instead of the safe version?
unchecked
is faster since the class isn't looked up each time a method is called: https://docs.rs/jni/latest/jni/struct.JNIEnv.html#checked-and-unchecked-methods.
I switched to the simpler safe method since I doubt it matters here.
talpid-core/src/connectivity_listener.rs
line 220 at r1 (raw file):
Previously, MarkusPettersson98 (Markus Pettersson) wrote…
+1
Done. (No longer relevant.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 6 files at r2, 11 of 14 files at r3, 3 of 4 files at r4.
Reviewable status: 24 of 28 files reviewed, 8 unresolved discussions (waiting on @dlon, @hulthe, and @Rawa)
mullvad-api/src/lib.rs
line 364 at r1 (raw file):
Previously, dlon (David Lönnhager) wrote…
I slightly prefer the explicit way since it forces the caller to select an appropriate resolver.
Sure :)
talpid-core/src/connectivity_listener.rs
line 109 at r1 (raw file):
Previously, dlon (David Lönnhager) wrote…
Done. (This is no longer relevant.)
🙌
mullvad-api/src/https_client_with_sni.rs
line 392 at r4 (raw file):
/// Resolve the provided `uri` to an IP and port using `address_cache` in the first place, and /// using `dns_resolver` otherwise.
Isn't this a bit misleading as address_cache
might not be used if if let Ok(addr) = hostname.parse::<IpAddr>() { return Ok(..) }
is hit?
Code quote:
/// Resolve the provided `uri` to an IP and port using `address_cache` in the first place, and
/// using `dns_resolver` otherwise.
talpid-core/src/connectivity_listener.rs
line 170 at r4 (raw file):
// No sender has been registered return; };
It is probably wise to log if this branch is ever hit. trace
or debug
should be fine
Code quote:
let Some(tx) = &*CONNECTIVITY_TX.lock().unwrap() else {
// No sender has been registered
return;
};
mullvad-daemon/src/android_dns.rs
line 35 at r4 (raw file):
let ips = ips.map_err(|err| { io::Error::other(format!("Failed to retrieve current servers: {err}")) })?;
⛏️ Drop one of the ips
bindings
Code quote (i):
let ips = self.connectivity_listener.current_dns_servers();
let ips = ips.map_err(|err| {
io::Error::other(format!("Failed to retrieve current servers: {err}"))
})?;
Code snippet (ii):
let ips = self
.connectivity_listener
.current_dns_servers()
.map_err(|err| {
io::Error::other(format!("Failed to retrieve current servers: {err}"))
})?;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 21 of 28 files reviewed, 7 unresolved discussions (waiting on @hulthe, @MarkusPettersson98, and @Rawa)
mullvad-api/src/https_client_with_sni.rs
line 392 at r4 (raw file):
Previously, MarkusPettersson98 (Markus Pettersson) wrote…
Isn't this a bit misleading as
address_cache
might not be used ifif let Ok(addr) = hostname.parse::<IpAddr>() { return Ok(..) }
is hit?
Done.
talpid-core/src/connectivity_listener.rs
line 170 at r4 (raw file):
Previously, MarkusPettersson98 (Markus Pettersson) wrote…
It is probably wise to log if this branch is ever hit.
trace
ordebug
should be fine
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 4 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status: 25 of 28 files reviewed, 6 unresolved discussions (waiting on @dlon, @hulthe, and @Rawa)
mullvad-api/src/https_client_with_sni.rs
line 392 at r4 (raw file):
Previously, dlon (David Lönnhager) wrote…
Done.
But the comment is still missleading, isn't it? It implies that address_cache
will be used, but it might not. Consider talking about URI first, and then get to address_cache
and dns_resolver
😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r7, all commit messages.
Reviewable status: 25 of 28 files reviewed, 5 unresolved discussions (waiting on @hulthe and @Rawa)
8c260f4
to
4a60f97
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 6 files at r2, 9 of 14 files at r3, 3 of 4 files at r4, 1 of 4 files at r5, 2 of 3 files at r6, 1 of 1 files at r7, 2 of 2 files at r8, all commit messages.
Reviewable status: 25 of 28 files reviewed, all discussions resolved (waiting on @Rawa)
mullvad-daemon/src/android_dns.rs
line 1 at r1 (raw file):
Previously, dlon (David Lönnhager) wrote…
Yes, depending on all of
talpid-core
seems excessive. I am tempted to moveConnectivityListener
out oftalpid-core
, though.
Yeah, ´talpid-corefeels like a catch-all library.
talpid-net? or maybe it's time for
talpid-android`? Food for thought, but this is not a blocker imo.
4a60f97
to
12e8251
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 6 files at r2, 9 of 14 files at r3, 2 of 4 files at r4, 4 of 4 files at r5, 2 of 3 files at r6, 1 of 1 files at r7, 2 of 2 files at r8, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! all files reviewed, all discussions resolved
12e8251
to
ad2fc60
Compare
Use
hickory-dns
to resolve{ipv4,ipv6}.am.i.mullvad.net
, which is non-blocking, instead of GAI.Fix DES-1409.
This change is